Survey on Hadoop and Introduction to YARN

Authors

  • Amogh Pramod Kulkarni
  • Mahesh Khandewal
Abstract

Big Data, the analysis of large quantities of data to gain new insight, has become a ubiquitous phrase in recent years. Data is growing at a staggering rate day by day. One of the efficient technologies for dealing with Big Data is Hadoop, which is discussed in this paper. Hadoop uses the MapReduce programming model to process jobs over large data volumes, and it uses different schedulers to execute those jobs in parallel. The default scheduler is the FIFO (First In First Out) scheduler. Other schedulers with priority, pre-emption and non-pre-emption options have also been developed. Over time, MapReduce has run into some limitations. To overcome them, the next generation of MapReduce, called YARN (Yet Another Resource Negotiator), has been developed. This paper provides a survey of Hadoop, a few of the scheduling methods it uses, and a brief introduction to YARN.

Keywords: Hadoop, HDFS, MapReduce, Schedulers, YARN.
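
As a rough illustration of the MapReduce programming model mentioned in the abstract, the following is a minimal single-process sketch in Python of a word count. It does not use Hadoop itself; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for this example, and a real Hadoop job would distribute these phases across many nodes.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key, imitating the step between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all counts emitted for one word."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle and reduce in sequence on an in-memory input."""
    mapped = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop"]))
```

The point of the sketch is only the division of work: the map and reduce functions are independent per record and per key, which is what allows a framework to run them in parallel.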

Similar resources

Study on Hadoop and MapReduce Framework

Hadoop, a Java software framework, supports data-intensive distributed applications. Hadoop is developed under an open source license. It enables applications to work with thousands of nodes and petabytes of data. Hadoop has formed a framework for Big Data analysis. Its MapReduce technique made it more useful for processing huge amounts of data. Hadoop is incorporated with cloud computi...

ABS-YARN: A Formal Framework for Modeling Hadoop YARN Clusters

In cloud computing, software which does not flexibly adapt to deployment decisions either wastes operational resources or requires reengineering, both of which may significantly increase costs. However, this could be avoided by analyzing deployment decisions already during the design phase of the software development. Real-Time ABS is a formal language for executable modeling of deployed virtua...

MPJ Express Meets YARN: Towards Java HPC on Hadoop Systems

Many organizations, including academic, research, and commercial institutions, have invested heavily in setting up High Performance Computing (HPC) facilities for running computational science applications. On the other hand, the Apache Hadoop software, after emerging in 2005, has become a popular, reliable, and scalable open-source framework for processing large-scale data (Big Data). Realizing the i...

Cluster management system design for big data infrastructures

ION OF HETEROGENEITY YARN creates containers on each machine based on the total memory and the number of CPU cores. If there are two machines with different memory size, then they will have different numbers of containers. In other words, unlike Hadoop, YARN takes resource heterogeneity into account, in the case of memory. However, YARN still does not consider heterogeneity in other resource ch...
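
The relationship described above, where machines with different memory sizes end up with different numbers of containers, can be sketched with simple arithmetic. This is a simplified model under stated assumptions, not YARN's actual allocation logic: the function name containers_per_node and the default container size (1024 MB, 1 virtual core) are hypothetical values picked for illustration.

```python
def containers_per_node(node_mem_mb, node_vcores,
                        container_mem_mb=1024, container_vcores=1):
    """Estimate how many equal-sized containers fit on one node.

    Assumption: every container has the same fixed size, so the count is
    limited by whichever resource (memory or cores) runs out first.
    """
    if container_mem_mb <= 0 or container_vcores <= 0:
        raise ValueError("container size must be positive")
    by_memory = node_mem_mb // container_mem_mb
    by_cores = node_vcores // container_vcores
    return min(by_memory, by_cores)


if __name__ == "__main__":
    # Two nodes with the same core count but different memory sizes.
    print(containers_per_node(8192, 16))
    print(containers_per_node(16384, 16))
```

Under these assumptions, the node with twice the memory holds twice as many containers as long as cores are not the limiting resource.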

EasyChoose: A Continuous Feature Extraction and Review Highlighting Scheme on Hadoop YARN

Today the Internet offers a massive amount of reviews and user experiences about a variety of products from different manufacturers, ranging from smartphones, automobiles, and home appliances to Internet services such as hotel booking and airplane booking. For a careful customer it is time-consuming to make good purchasing decisions due to a variety of similar products, lots of reviews for each...

Publication date: 2014